Search results for "combinatorics on words"

showing 10 items of 49 documents

Languages with mismatches

2007

AbstractIn this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…

Combinatorics on wordsApproximate string matchingGeneral Computer ScienceRepetition (rhetorical device)String (computer science)Search engine indexingComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Approximate string matchingData structureTheoretical Computer ScienceCombinatoricsSet (abstract data type)Formal languagesCombinatorics on words Formal languages Approximate string matching IndexingIndexingWord (group theory)MathematicsInteger (computer science)Computer Science(all)Theoretical Computer Science
researchProduct

The Shuffle Product: New Research Directions

2015

In this paper we survey some recent researches concerning the shuffle operation that arise both in Formal Languages and in Combinatorics on Words.

Star-free languageComputer scienceProgramming languageComputer Science (all)Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)computer.software_genreIntermixed languageTheoretical Computer ScienceCombinatorics on wordsTheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITYProduct (mathematics)Formal languageShuffle squarecomputerShuffle
researchProduct

A NEW COMPLEXITY FUNCTION FOR WORDS BASED ON PERIODICITY

2013

Motivated by the extension of the critical factorization theorem to infinite words, we study the (local) periodicity function, i.e. the function that, for any position in a word, gives the size of the shortest square centered in that position. We prove that this function characterizes any binary word up to exchange of letters. We then introduce a new complexity function for words (the periodicity complexity) that, for any position in the word, gives the average value of the periodicity function up to that position. The new complexity function is independent from the other commonly used complexity measures as, for instance, the factor complexity. Indeed, whereas any infinite word with bound…

Average-case complexityDiscrete mathematicsFibonacci numberSettore INF/01 - InformaticaGeneral Mathematicscomplexity functionComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Function (mathematics)periodicitycritical factorization theoremCombinatoricsComplexity indexCombinatorics on wordsBounded functionComplexity functionComputer Science::Formal Languages and Automata TheoryWord (computer architecture)Combinatorics on wordMathematicsInternational Journal of Algebra and Computation
researchProduct

"Indexing structures for approximate string matching

2003

In this paper we give the first, to our knowledge, structures and corresponding algorithms for approximate indexing, by considering the Hamming distance, having the following properties. i) Their size is linear times a polylog of the size of the text on average. ii) For each pattern x, the time spent by our algorithms for finding the list occ(x) of all occurrences of a pattern x in the text, up to a certain distance, is proportional on average to |x| + |occ(x)|, under an additional but realistic hypothesis.

CombinatoricsCombinatorics on wordsPattern recognition (psychology)Search engine indexingAutomata theoryHamming distanceString searching algorithmApproximate string matchingTime complexityMathematics
researchProduct

On the Shuffle of Star-Free Languages

2012

Motivated by the general problem to characterize families of languages closed under shuffle, we investigate some conditions under which the shuffle of two star-free languages is star-free. Some of the special cases here approached give rise to new problems in combinatorics on words.

Discrete mathematicsAlgebra and Number TheorySettore INF/01 - Informaticapure submonoidGeneral problemAbstract family of languagesRegular languageComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Star (graph theory)star-free languageCone (formal languages)shuffle of languagePumping lemma for regular languagesTheoretical Computer ScienceCombinatorics on wordsComputational Theory and MathematicsRegular languagecombinatorics on words.Information SystemsMathematicsFundamenta Informaticae
researchProduct

Block Sorting-Based Transformations on Words: Beyond the Magic BWT

2018

The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…

0301 basic medicineSettore INF/01 - InformaticaComputer scienceData_CODINGANDINFORMATIONTHEORY0102 computer and information sciencesBlock sortingData structureLexicographical order01 natural sciencesUpper and lower bounds03 medical and health sciencesCombinatorics on words030104 developmental biology010201 computation theory & mathematicsArithmeticCompressed Data Structures Block Sorting Combinatorics on Words AlgorithmsData compression
researchProduct

Balancing and clustering of words in the Burrows–Wheeler transform

2011

AbstractCompression algorithms based on Burrows–Wheeler transform (BWT) take advantage of the fact that the word output of BWT shows a local similarity and then turns out to be highly compressible. The aim of the present paper is to study such “clustering effect” by using notions and methods from Combinatorics on Words.The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is actually the combinatorial property of input word that ensure optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity we have after BWT (and therefore the better the compression is).…

Discrete mathematicsGeneral Computer ScienceBurrows–Wheeler transformCombinatorics on wordsPalindromeComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Binary alphabetTheoretical Computer ScienceCombinatorics on wordsData compressionEntropy (information theory)Combinatorics on words; Burrows–Wheeler transform; Data compressionArithmeticCluster analysisEmpirical evidenceBurrows–Wheeler transformComputer Science::Formal Languages and Automata TheoryMathematicsData compressionComputer Science(all)
researchProduct

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as measure of repetitiveness, especially to evalua…

FOS: Computer and information sciencesBurrows–Wheeler transformSettore INF/01 - InformaticaCombinatorics on wordsFormal Languages and Automata Theory (cs.FL)Computer scienceString (computer science)Search engine indexingCompressed data structuresComputer Science - Formal Languages and Automata TheoryString indexingData structureMeasure (mathematics)Burrows-Wheeler-TransformRepetitivenessCombinatorics on wordsBurrows-Wheeler-Transform Compressed data structures String indexing Repetitiveness Combinatorics on wordsTransformation (function)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)AlgorithmData compression
researchProduct

STURMIAN WORDS AND AMBIGUOUS CONTEXT-FREE LANGUAGES

1990

If x is a rational number, 0<x≤1, then A(x)c is a context-free language, where A(x) is the set of factors of the infinite Sturmian words with asymptotic density of 1’s smaller than or equal to x. We also prove a “gap” theorem i.e. A(x) can never be an unambiguous co-context-free language. The “gap” theorem is established by proving that the counting generating function of A(x) is transcendental. We show some links between Sturmian words, combinatorics and number theory.

CombinatoricsDiscrete mathematicsRational numberCombinatorics on wordsNumber theoryContext-free languageComputer Science (miscellaneous)Generating functionSturmian wordNatural densityTranscendental numberMathematicsInternational Journal of Foundations of Computer Science
researchProduct

Enumeration and Structure of Trapezoidal Words

2013

Trapezoidal words are words having at most $n+1$ distinct factors of length $n$ for every $n\ge 0$. They therefore encompass finite Sturmian words. We give combinatorial characterizations of trapezoidal words and exhibit a formula for their enumeration. We then separate trapezoidal words into two disjoint classes: open and closed. A trapezoidal word is closed if it has a factor that occurs only as a prefix and as a suffix; otherwise it is open. We investigate open and closed trapezoidal words, in relation with their special factors. We prove that Sturmian palindromes are closed trapezoidal words and that a closed trapezoidal word is a Sturmian palindrome if and only if its longest repeated …

FOS: Computer and information sciencesFibonacci numberSpecial factorGeneral Computer ScienceFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheoryEnumerative formulaDisjoint sets68R15Theoretical Computer ScienceFOS: MathematicsPalindromeMathematics - CombinatoricsClosed wordsFibonacci wordMathematicsDiscrete mathematicsClosed wordSequenceta111Sturmian wordPrefixCombinatorics on wordsRich wordtrapezoidal wordF.4.3Combinatorics (math.CO)SuffixWord (group theory)Computer Science(all)
researchProduct